arXiv:1505.05660vl [stat.ME] 21 May 2015 


A factor model approach for the joint segmentation with 
between-series correlation 

X. Collilieux®’^, E. Lebarbier^’*^’*, S. Robin'^’'^ 


(“) IGN LAREG, Univ Paris Diderot, Paris, France 

(^) Observatoire de Paris, SYRTE, GNRS, UPMG, Paris, France 

p) AgroParisTech, UMR 518 MIA, Paris, France 

('^) INRA, UMR 518 MIA, Paris, France 

(*) corresponding author: lebarbieagroparistech.fr 


Abstract. We consider the segmentation of set of correlated time-series due e.g. to some spatial structure. 
We propose to model the between-series dependency with a factor model. This modeling allows us to use the 
dynamic programming algorithm for the inference of the breakpoints, which remains the most efficient strategy. 
We also propose a model selection procedure to determine both the number of breakpoints and the number of 
factors. The performance of our proposed procedure is assessed through simulation experiments. An application 
to geodetic data is presented. 
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1 Introduction 

The objective of segmentation methods is to detect abrupt changes in the distribution of a signal. Such 
segmentation problems arise in many areas: in biology for the detection of chromosomal aberrations ([9, 5]), in 
climatology for the detection of changes due to instrumental changes ([2, 6]) or in geodesy for the detection of 
changes in GPS location series either due to instrumental or to environmental changes as earth’s crust shifts 
[ 12 ]. 

In many cases, we have at hand multiple series (several patients or GPS receivers) and it is likely that some 
dependency exist between them. Probe effect in the genomic context [7] or spatial correlation in climatology 
and geodesy context (due to their spatial organization) typically results in such a between-series dependency. 
Dealing with multiple series requires to account for these correlations in order to avoid false breakpoint detection 
(see [8]). 

We consider here the detection of breakpoints in a set of time-series. We are typically interested in the case 
where this data consists of series of measurements observed along time in different locations. Each series is 
supposed to be affected by changes at series-specific breakpoints and the signals observed at each location are 
supposed to be correlated from one series to another. One of the difficulty here is to propose a modeling for 
which an efficient estimation algorithm can be derived. Indeed, the inference of segmentation models requires 
to search over the space of all possible segmentations, which is prohibitive in terms of computational time, 
when performed in a naive way. The Dynamic Programming (DP) strategy is the only one that gives the exact 
solution in a fast way but only applies when the contrast to be optimized is additive with respect to the segments 
(see [1, 2, 9]). In presence of dependency, the contrast (e.g. the log-likelihood) is generally not additive. In some 
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sense, our strategy consists in ’removing’ the dependency so that, at a certain step of the estimation algorithm, 
DP can be applied to transformed data. 

A similar setting is considered in [7] where a variance component model is used to account for the dependency 
between the series. This case consists in the simplest correlation form we can have, that is a uniform correlation 
between all series. Our purpose here is to broad the set of possible dependency structure that the modeling can 
account for. The factor model provides a convenient and efficient way to model covariance matrices, possibly 
limiting the number of parameters. It can be viewed as a generalization of variance components models, where 
the components are unknown and need to be estimated, together with the associated variances. It is based on 
the spectral decomposition of the covariance matrix and has been successfully applied in situations where very 
little is known about the correlation structure (see e.g. [3]). The inference of factor models can be achieved via 
an EM algorithm. 

We present here a general model for correlated Gaussian series, based on a factor model for the covariance 
matrix. We show that some by-product of the EM algorithm can be used to remove the dependency between the 
series. This allows us to combine the EM and DP algorithms together. In terms of model selection, we propose 
a heuristic procedure combining two BIG criteria: the classical BIG for the choice of the number of factors 
and the modified BIG criterion derived by [13] in the context of segmentation for the choice of the number of 
segments. 

The article is organized as follows. In Section 2, we present the proposed model for multiple series which 
is a segmentation model combined with a factor model. The EM algorithm proposed to obtain the maximum 
likelihood estimates is described in Section 3. In Section 4, we introduce a model selection procedure for both 
the number of segments and the number of factors. A simulation study is performed in Section 5 and in Section 6 
we apply our method to geodetic data. 

2 Model 

We consider M series with n points each. We note ytm the observed signal of series m at time t. The total 
number of observations is N = nM. The data are gathered in a matrix Y with dimension [n x M]. For a given 
matrix A, we denote by At its t-th row and by A”* its m-th column. Thus the column Y™ represents whole 
series m, while the row Yt contains the observations at time t in all the series. 

Breakpoints and segments We consider here that each series has its own breakpoints: the mean of the series 
ym _ (Ytm)t=i..n is subject to Km — 1 Specific abrupt changes at breakpoints (t^)fc=o..if„ (with convention 
t™ = 0 and = n) and is constant between two breakpoints within the interval + l;t™]- In the 

following we denote hy K = the total number of segments and n]? = ~ the length of segment 

k from series m (fc = 1,..., Km)- The segmentation model is written as follows: 

Ytm = Mfcm + Ffm Vt G 7™, (1) 

where the M-dimensional error vectors {Ft}t=i..n are supposed to be i.i.d. centered Gaussian and with covari¬ 
ance matrix S. 

Correlations between series We model the dependency between the series via a factor model. More 
precisely, we assume that the dependency across series at each time t is captured by some latent factor Zt, with 
dimension Q, which affects all residual terms through a matrix B: 

Q 

Ftm = Ztqbqm + Etm 'it,m Ft = ZtS' + Et Vt 

9=1 
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where B = (bmq) is a fixed [M x Q] matrix and where the random vectors Zt = {Ztq) are i.i.d., centered, Gaus¬ 
sian with covariance matrix Iq and independent from the random vectors Et = (Etm), which are i.i.d. centered 
Gaussian with diagonal covariance matrix 'S'. Note that assumption Y{Zt) = Ig is necessary for identifiability 
reasons. 

As a result, the covariance matrix S of the Ft writes 

S = BB' + (2) 

In this decomposition, BB' refers to the shared variance and to the specific one. 

An important feature of the factor model is that any M-dimensional covariance matrix can be recovered, 
provided that Q = M — 1. As an example, the classical spatial correlation structure (S)m,m' oc '> where 

d{m, m!) stands for the distance between the locations of series m and m! can be rewritten under the form (2). 
Another interest of the factor model is that a small value Q provides regularization of the covariance matrix. 

Model With the previous decomposition of the variability, the model (1) can be rewritten as a mixed linear 
model: 

Q 

^tm — f^km A ^ ^ Ztqbq^ -j- Etm Vt G . 

9=1 

This linear model can be written with the following matrix form: 

Y = Tfj, +ZB' + E, (3) 

where Y, with size [n x M , stands for the observed data, T is the incidence matrix of breakpoints with size 
[n X K], T™ = Bloc |^fln™ , and T = , /x is the means with size [K x M] (and the mean 

of the segment k for series m) such that /x™ = Bloc ], and /x = [/x^ ij? ... /x''^], Z with size [n x Q] and 
B with size [M x Q], and E with size [n x M], 

The main difference between this model and a classical mixed linear model is that both the incidence matrix T 
and the factor matrix B are unknown. 

Parametrization This detour through the factor model is motivated by an algorithmic issue. Indeed, dealing 
with a covariance matrix which is not diagonal as for example (S)m,m' oc i hampers the direct use of the 

DP algorithm (used for the inference of the breakpoints). As shown in the next section, the reparametrization 
of E as (2) where is diagonal allows us to use it. 

To estimate all the parameters of the model, we proceed as classically in two steps: we estimate the param¬ 
eters (j) = (T, /X, (T^, B), the number of factors Q and the number of segments K being fixed (Section 3), then 
we use a model selection strategy to choose Q and K (Section 4). 

3 Estimation using the EM algorithm 

To estimate the set of parameters (j), we consider the maximum-likelihood procedure. This can be done via 
an EM algorithm, considering that Z represents the missing (or latent) variables [10]. This algorithm is now 
classical for the inference in mixed linear model (see for example [11]). We remind that the key quantity in the 
EM algorithm is the conditional expectation, {logl2(Y|Z; (())[Y} where the complete-data 

log-likelihood is 

log£(Y, Z; cP) = log£(Y|Z; cf,) + log£(Z). 
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Since the distribution of Z does not depend on the parameters <f>, only the first term will be considered which 
writes 

n 

—21og£(Y|Z; (f>) = N\og {2n) + nlog (|’®'|) + E ll^t - Mfc(t) - 

t=i 


then its conditional expectation satisfies 

-2(5(<^;0W) = iVlog(27r)+nlog(|’S'l) 

+ E + Tr , 

t = l 

where is the value of (j) at iteration (h), E^l-} is the expectation calculated with cf) as the parameter value 
and V^j-} the corresponding variance, Z^^^ = {Zt|Y}, and = V^ch) {Zt|Y}. Tr(yl) stands for the 

trace of matrix A and |j 4| for its determinant. 

The EM algorithm is an iterative algorithm which consists in two steps (E-step and M-step) at each iteration. 
At iteration (h -|- 1), we have 

E-step This step consists in the calculation of the conditional expectation Q{(j); which requires the 
conditional moments Z and W. We get 

where = Y* - /xip) • 

M-step This step consists in the estimation of the parameters by maximizing the obtained conditional ex¬ 
pectation. We get 

• Estimation of B: 

t = l 




£=1 




• Estimation of residual covariance: 

= argm|x(3o = 

where =Yt — Mk^j) ~ In the case where ^ = ct^Im, as we consider in the simulation 

study and the application, we get 

^ 2 .(h+i) ^ ^ ^ + Tr 

^ t=l 

and '^-('*+1) = cr 2 ,(?i+i)i^. 
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• Estimation of the segmentation parameters T/x: 


t=i 

n 

= argmin^||yt-(T/x)i||2^(;.+i)j_i, 

t=i 

where Et = Y^ —This last term can be viewed as a correction to remove dependency between 
the series. 'S’ being diagonal, the DP algorithm can be used to obtain the segmentation parameter. In 
particular, we use the two-stages DP proposed by [7, 8] which is a fast version of DP dedicated to the 
joint segmentation of multiple series. In the case where 'S' = the homoskedastic noise case, the 

quantity to be minimized is a residual sum of squares. In the heteroskedastic noise case, the considered 
sum of squares is its weighted version. 


4 Model selection 


Both the number of factors Q and the number of segments K should be estimated. This joint model selection 
issue is not classical and difficulty arises due to the different nature of the parameters at hand. Indeed the 
likelihood function is continuous with respect to the loadings B of the latent vectors, so the classical framework 
for the BIG approximation holds. This is not true for the segmentation parameters and a specific BIG approxi¬ 
mation needs to be derived, as observed by [13]. Furthermore, the two resulting criteria do not share the same 
form, so they can not be easily combined into a single criterion. We propose here a two-step heuristic to select 
these two parameters. 

For a fixed number of segments K, we use the classical BIG criterion 

BICk{Q) = 2E(T/x^, Sq) - Dq log (n). 


where Sg is the covariance matrix for a given Q, T/x^ is the segmentation parameters for a given K, 
E(T/x^,Sq) is the log-likelihood calculated at its maximum for the model with K segments and Q factors, 
and Dq is the number of parameters in a model with Q factors, Dq = Q{2M — Q -I- l)/2 -|- 1. Indeed for the 
variance components, according to the variance decomposition (cf equation (2)), the number of parameters are 
MQ for B and one for cr^. Moreover, with the orthogonality condition on B, only MQ — Q{Q — l)/2 need to 
be estimated. 

The number of factor Q is then selected as 

Qk = argmaxi3/Cic((5). 

Q 


For each Q, we select the number of segments K with the modified BIG 
the joint segmentation of multiple series; 


proposed by [13] and adapted to 


mBICQ{K) 


K-M\ 

J 


log 


SS,u 

2 


+ log 


P 


JV-K 



N-K 



log 


\ S'S'wg(t) j 


. M km 

logn^-(7i'-M) log (IV), 

m=l k=l 
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with 


55wg(?) = ^(Yt - Mfc(t))SQ'(Yt - Mfc(t))', 
t = l 
n 

55an = ^(Yt - Y)S“\Yt - Y)', 

t = l 

55bg(?) = 55all - 55wg(?), 

n™ is the length of segment k in series m (n™ = ~ k'kit) is a vector of size M with the component 

m is Vmk = {n^)~^ 2/m(i) if t G J|". 

The number of segments is the chosen as 

Kq = argma,xmBICQ{K). 

We then propose the following heuristic: choose the best Q for each K, then select the best K among them: 
Qk = argm^xi3/CK(<3), ^Qk ~ ^i^ST^&^rnBICQ^{K) and Q = Q^- 


5 Simulation study 

In this section, we illustrate the importance of taking the dependency into account and we study the behavior 
of our model selection heuristic for K and Q and its impact on the estimation of all parameters. 

5.1 Simulation design and quality criteria 

Simulation design We consider differents number of series M G {5,10} with different lengths n G {50,100). 
For each series Y'", the number of breakpoints {K — 1) is Poisson distributed with mean k {k = Z for n = 50 and 
k = 5 for n = 100) and their positions are uniformly distributed. The mean value within each segment alternates 
between 0 and values in {—2, —1,+1,+2). We consider different residual standard deviations a G (.2, .5,1} 
and a spatial-type correlation between series: distances d are simulated as distances between Gaussian bivariate 
random vectors and E = ((1 — a)p'^ + uIm) x with a = 0.2 and p G (0.2, 0.8}. The parameter p controls the 
intensity of the dependency between series. Each configuration is simulated 100 times. 

Quality criteria To asses the quality of the estimation of the covariance matrix, we use the root mean 

squared distance between the true parameter and its estimate : RMSE(E) = (^ij ~ ^ij)^ ■ Foi" 

the segmentation parameters, we are interested in the performance of the breakpoint positioning. To measure it, 
we consider both the proportion of erroneously detected breakpoints among detected breakpoints (false positive 
rate, FPR) and the proportion of detected true breakpoints among true breakpoints (true positive rate, TPR). 
A perfect segmentation results in null FPR and TPR equals to 1. For each configuration we consider the average 
of these criteria over the 100 simulations. 

5.2 Results 

Only the results with M = 10 and n = 100 are presented since the results for the other configurations lead to 
same conclusions. In this section, we use notations * for the true parameters and est for the estimated ones on 
the graphs. 
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Figure 1: FPR on the left and TPR on the right for p = 0.2 (top) and p = 0.8 (bottom) using K. We distinguish 
the cases Q = Q (denoted Qest on the graph) and Q = 0 (e.g. the segmentation only). a;-axis: a. 


Accounting for the dependency Figure 1 compares the selected segmentation when the dependency is 
considered {Q = Q) or not ((5 = 0). Whatever the difficulty of the detection problem (different values of cr), 
accounting for the dependency increases the performance of the segmentation (smaller FPR and larger TPR). 
This is particularly marked when the dependency is hight {p = 0.8, bottom of the figure). 

Discussion on the selection of K Figure 2 (top) shows that, whatever the level of the dependency, when 
the noise is small (cr = 0.2), the selected number of segments is close to the true one and the breakpoints are well 
positioned (see Figure 2 (bottom)). When the detection problem gets difficult (cr large), the selection procedure 
tends to underestimate the number of segments leading to a better precision on the breakpoints positioning 
compared to the true number of segments (smaller FPR). This result was expected since in this case one may 
prefer to avoid the detection of false positive breakpoints, as generally observed in segmentation problems. 

Discussion on the selection of Q The number of factors is strongly underestimated (see Table 1) (close to 
1 in average for p = 0.2 and for the different values of cr, not shown) meaning that only few factors are necessary 
to capture the dependency structure. This underestimation does not alter the estimation of S (compared to the 
true number of factors) and increases the power of procedure in terms of breakpoint positionning (in terms of 
FPR and TPR). Moreover, the selected number of factors decreases sligthly with the difficulty of the detection 
problem (with cr) leading to a deacreasing precision of the estimation of E. 

We observe the benefit of the factor model estimate in terms of regularization. 

Confusion between K and E As observed in Figure 2 (bottom), the true number of segments K = K* leads 
to numerous false positive breakpoints. In some sense, K* is too large for the data at hand. The consequence 
of oversegmentation is that no factor are selected for cr = 0.5,1, meaning that the dependency is captured by 
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Figure 2: Top: K — K* obtained with Q = <5 for p = (0.2, 0.8) (denoted r on the graph). Bottom: FPR (left) 
and TPR (right) with Q = Q, p = 0.8 for K = K* and K = K. x-axis: cr. 



{Q,K) 

{Q\'K) 

a 

0.2 

0.5 

1 

0.2 

0.5 

1 

mean of Q 

3.37 

2.74 

2.39 

9 

RMSE(E) 

0.005 

0.034 

0.119 

0.0048 

0.032 

0.124 

FPR 

0.016 

0.110 

0.288 

0.039 

0.175 

0.413 

TPR 

0.93 

0.69 

0.34 

0.93 

0.597 

0.262 


Table 1: Mean of Q, RMSE(E), FPR and TPR for p = 0.8 where Q* is the true number of factors. 






the segmentation (results not shown). 


6 Application 

Data description Scientific Permanent GNSS instruments continuously track electromagnetic signals from 
GPS satellites. Their data are generally computed by scientific, private or public services in near-real time or 
after getting a few days of observations to derive accurate coordinates. These coordinates are used to determine 
precise velocities of points located on the crust that constrain tectonic models and Earth’s crust/mantle param¬ 
eters, to infer mass transfer in the fluid layers of the Earth or for positioning applications (terrestrial reference 
frame). 

Coordinate time series from five GNSS stations have been used in this study. These stations, located in the 
Michigan and Ohio States in USA. They are encoded MPLE, ADRI, BAYR, BRIG and DEEP Only the longitude 
component has been investigated here; from the February 2002 to June 2013, 3776 longitude coordinates are 
available per station. Because they are separated by less than 250 km, their coordinate time series show 
similarities related to common ground deformation and correlated processing errors. The predominant effect 
in this component is the tectonic motion of the North American plate, which is about -16 mm/yr. When 
making difference of coordinate series from close stations, we can hope that this effect is cancelled. Only 
residual differential motion as well as residual noise still remains in addition to sudden changes often related to 
equipment or environmental changes at one of the two stations. 

Working with difference of series allows us to neglect the modelisation of the ground deformation (which can 
be complex, see [4]), that our model does not take into account, as in [2] for climate series. In this example, 
we use station MPLE as the reference series and form four time series from the four other slave series, denoted 
ADRI-MPLE, BAYR-MPLE, BRIG-MPLE and DEFI-MPLE. 


Accounting for dependency For those series, a model where the dependency is considered is prefered 
according to the model selection criterion to the model where it is not the case ((5 = 0). More precisely, our 
method selects one factor (Q = 1) and A = 46 segments. With the segmentation only, the number of segments is 
estimated to be 70, which is significantly larger. Figures 4 and 3 give the four series with the estimated breakpoint 
positions (vertical lines) obtained by taken into account the dependency or not respectively. Accounting for 
the dependency is important to avoid too many false breakpoint detection (the additional breakpoints do not 
correspond to known events, see Table 2). 

Covariance structure The choice Q = 1 seems to be sufficient to capture the dependency structure among 
series. E well captures the spatial dependency between series as shown in Figure 5 which represents the 
estimation of S according to the distance between slave series (available here) and where we observed that the 
larger the distance, the higher the correlation. 

Breakpoint interpretation Concerning the estimated breakpoint positions, we can distinguish: 

• the common breakpoints among almost the four series (coded by the dotted lines (in red) in Figure 4). 
These breakpoints can be due to changes in the reference series MPLE. Among them, the breakpoint at 
day 54228 is known to be due to antenna and receiver changes (see Table 2). 

• the series-specific changes. Some of them are due to known instrumental changes (the dashed lines (in 
black)). Moreover the change at day 53620 in the series 4 can be due to the known change in the series 
MPLE (see Table 2). The others (the solid lines (in black)) remain, up to now, unexplained even if some 
are close to known events. 
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Figure 3: Estimated breakpoints for the 4 series obtained with the segmentation only. 


We also observed that in the series 2 (BAYR-MPLE), the change at day 55077 is not refered as known event but 
seems well marked. Moreover, the series 4 (DEFI-MPLE) is the farest series from MPLE. This can explained 
the periodic behavior observed in the series of difference (the tectonic motion part is not completly corrected) 
and so a large number of estimated breakpoints is needed to ht it. 
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Figure 4: Estimated breakpoints for the 4 series when the dependency is taken into account. Dotted lines (in 
red): common breakpoints among almost the 4 series. Solid line (in blue): series-specific breakpoints. Dashed 
lines (in black): known series-specific breakpoints. 
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Serie 

date 

time in week 

description of the change 

ADRI 

2004-12-02 

53341 

receiver 

ADRI 

2005-09-15 

53628 

receiver 

ADRI 

2006-08-02 

53949 

receiver 

BAYR 

2004-12-02 

53341 

receiver 

BAYR 

2005-08-31 

53613 

receiver 

BAYR 

2006-07-11 

53927 

receiver 

BRIG 

2004-12-02 

53341 

receiver 

BRIG 

2005-09-12 

53625 

receiver 

BRIG 

2008-04-14 

54570 

antenna 

DEFI 

2006-10-11 

54019 

antenna 

DEFI 

2009-04-07 

54928 

receiver 

DEFI 

2011-01-26 

55587 

receiver 

DEFI 

2012-08-02 

56141 

antenna 

DEFI 

2012-11-05 

56236 

antenna 

MPLE 

2004-12-02 

53341 

receiver 

MPLE 

2005-09-14 

53627 

receiver 

MPLE 

2006-12-28 

54097 

receiver 

MPLE 

2007-05-08 

54228 

antenna and receiver 

MPLE 

2007-10-05 

54378 

receiver 


Table 2: Known changes in the five series. 


n3 

E 

O) 

OD 


CM 

O 

o 


00 

o 

o 

o 


o 

o 

o 


o 

o 

p 

o 


0.0 0.5 1.0 1.5 2.0 


D 


Figure 5: Estimation of E according to the distance between series. 
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